ABSTRACT

Investigating Hopfield's model of associative memory implementation by a neural network led to a generalised potential system with much superior performance as an associative memory. In particular, there are no spurious memories, and any set of desired points can be stored, with unlimited capacity (in the continuous-time, real-space version of the model). There are no limit cycles in this system, and by proper choice of the design parameters the size of all basins of attraction can reach up to half the distance between the stored points.

A discrete-time version, whose state space is the unit hypercube, is also derived, and it has superior properties compared to the corresponding Hopfield network. In particular, the capacity of any system of N neurons with a fixed desired size of basins of attraction grows exponentially with N and is asymptotically optimal in the information-theoretic sense. The computational complexity of this model is slightly larger than that of the Hopfield memory, but of the same order.

The results are derived under an axiomatic approach, which specifies the desired properties and shows that the above-mentioned model is the only one


INTRODUCTION 


J. Hopfield's suggestion of a neural network model for associative memories in [1] aroused the interest of many scientists and led to an effort to analyse its properties mathematically [2-12]. This model is simple to implement, but its properties are hard to capture intuitively, and its performance as an associative memory is even harder to analyze rigorously. This is perhaps the main reason why it has attracted such interest.

The still partial analysis of this preliminary model has revealed the following major disadvantages:

(a) There are many spurious memories generated at unexpected places (c.f., [3,7,8,9,11]), which attract a major part of the inputs.

(b) The capacity of the various versions of this model is bounded by N, the number of neurons (c.f., [3,5,6,7,8,10]), which is quite a poor capacity compared to the Information Theory bounds on error correcting codes. Some suggestions as to how this bound can be enlarged appear in [12], but they supply only a partial answer.

(c) Not only is the capacity limited, it is also context dependent, i.e., there are very small sets of memories which cannot be stored in the original Hopfield model, and the shape of a basin of attraction depends on far-away attractors (c.f., [6]).

This motivated another suggestion: a continuous-time model with evolution governed by N Ordinary Differential Equations (ODE's) (c.f., [13]). The new model is reported experimentally to have better performance, although it still suffers from some of the drawbacks of its ancestor. Furthermore, a rigorous analysis of the ODE version seems to be almost impossible.




In this work we take a different approach. We start by assuming only the generic form of our model for associative memory, and derive its structure and properties from a set of assumptions on the system.

The basic model we have in mind is a set of memories ("particles", in the sense of classical mechanics or electrostatics, with a specified "charge/mass"), which in the simplest case are placed at the locations of the desired memories. In general we allow for "spread out" charges (i.e., charge/mass densities instead of δ-functions). To each such "particle" α we therefore associate its "charge density" $\mu_\alpha$. In general (unlike in classical mechanics), we allow for various "types" of particles (memories), i.e., the potential associated with each particle may be different (and this will affect the shape of the basins of attraction). Indeed, the following three properties would be desirable for an associative memory model:

(P1) The system should be invariant to translations and rotations of the coordinates.

(P2) The system should be linear w.r.t. adding particles, in the sense that the potential of two particles should be the sum of the potentials induced by the individual particles (i.e., we do not allow inter-particle interactions; see, however, the discussion in Section IV).

(P3) Particle locations are the only possible sites of stable memory locations.

In order to state our results, we need to define our system exactly and describe how we build a memory out of the desired specifications.

In what follows, we take $\mathbb{R}^N$ to be our state space, and use first-order, potential-type ODE's to avoid the kind of "kinetic equilibria" one finds in second-order equations, i.e., the equations of motion are:

$$\dot{x} = -\nabla V(x) \tag{1}$$


where $V(x)$ is our potential and $\nabla$ stands for the gradient operator. Since we want to allow for various particle types, let us define a "type-space" $A$. $A$ may be finite, countable, or even non-discrete, but clearly $A$ is smaller than $\mathbb{R}^N$ and therefore finite dimensional. We assume that $A$ is a measurable space, so that integration over $A$ is well defined. The specific examples for $A$ that we have in mind are a finite, discrete set $\{1,2,\ldots,K\}$ or $\mathbb{R}^N$ itself.

Our memory building process is defined as a transformation $V(x) = T(\mu(\cdot,\cdot))$, where $\mu \in \mathcal{M}(\mathbb{R}^N \times A)$, and $\mathcal{M}(\mathbb{R}^N \times A)$ stands for the space of measures over $\mathbb{R}^N \times A$ (which is clearly a linear space). For example, assume we want a potential $V_{x_0}(x)$ for a single particle of type $\alpha$ ($\alpha \in A$) located at $x_0$. Then

$$V_{x_0}(x) = T(\mathbf{1}_{x_0} \times \mathbf{1}_{\alpha}).$$


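(P1)-(P3) now read:

(A1) If $T(\mu(\mathbb{R}^N \times A)) = V(x)$, then $T(\mu((c\,\mathbb{R}^N + n) \times A)) = V(cx + n)$, where $n \in \mathbb{R}^N$, $c \in \mathbb{R}^{N \times N}$ is a nonsingular, orthogonal matrix, and $c\,\mathbb{R}^N + n$ is the $c$-rotation, $n$-translation of $\mathbb{R}^N$.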

(A2) $T$ is a linear transformation over $\mathcal{M}$, i.e.,

$$T(\mu(\mathbb{R}^N \times A)) = \int_{\mathbb{R}^N \times A} f(x,y,\alpha)\,\mu(dy \times d\alpha) \tag{2}$$

where $f(x,y,\alpha)$ is the kernel of the transformation (Green's function), and we assume that $f(\cdot,\cdot,\cdot)$ is such that (2) makes sense.

(A3) Let $V(x) = T(\mu(\mathbb{R}^N \times A))$, let $D = \{(x,\alpha) \mid \mu(dx,\alpha) \neq 0\}$, and let $\bar{D}$ denote the closure of $D$. Then $V(x)$ does not possess minima outside of $\bar{D}|_{\mathbb{R}^N}$, where $\bar{D}|_{\mathbb{R}^N}$ is the restriction of $\bar{D}$ to $\mathbb{R}^N$.


In addition, we assume throughout the necessary smoothness and growth conditions on $f(x,y,\alpha)$ and its derivatives (w.r.t. $x$, $y$). In particular, we assume that if $f(x_0,y_0,\alpha_0)$ is finite, then $f(x,y,\alpha_0)$ is twice continuously differentiable w.r.t. $x$ at $(x_0,y_0)$ for all $x_0 \neq y_0$, with derivatives integrable w.r.t. $\mu(\cdot,\cdot)$.

Our first result, which is proven in the Appendix, is the following structure theorem:

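Theorem 1. $V(\cdot)$ satisfies (A1)-(A3) in a universal manner w.r.t. every $\mu \in \mathcal{M}$ iff it is of the structure:

$$V(x) = \int_{\mathbb{R}^N \times A} f_\alpha(\|x-\eta\|^2)\,\mu(d\eta \times d\alpha) \tag{3}$$

where, $\forall \alpha \in A$, $f_\alpha(\|x-\eta\|^2)$ is a sub-harmonic function on $\mathbb{R}^N - \{\eta\}$, i.e.,

$$\forall d > 0: \quad f''_\alpha(d)\,d + \frac{N}{2} f'_\alpha(d) \le 0. \tag{4}$$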
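The link between (4) and a sign condition on the Laplacian (it reappears as (A.1) in the Appendix) can be checked by a direct computation. Writing $d = \|x-\eta\|^2$, so that $\nabla_x d = 2(x-\eta)$:

$$\nabla_x f_\alpha(d) = 2 f'_\alpha(d)\,(x-\eta), \qquad \Delta_x f_\alpha(d) = 2N f'_\alpha(d) + 4\|x-\eta\|^2 f''_\alpha(d) = 4\Big[f''_\alpha(d)\,d + \frac{N}{2} f'_\alpha(d)\Big].$$

Hence (4) says exactly that $\Delta_x f_\alpha(\|x-\eta\|^2) \le 0$ on $\mathbb{R}^N - \{\eta\}$.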


Solving (4) under the assumption that $f'_\alpha(d_0) \neq 0$ for some $d_0 > 0$, and adding the proper constant so that

$$f_\alpha(d_0) = -\frac{f'_\alpha(d_0)\,d_0}{\frac{N}{2} - 1}$$

(the details are given in the Appendix), we obtain, for $N \ge 3$:


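Lemma 1. Any solution of (4) can possess at most one local maximum (and no local minima) for $d \in (0,\infty)$, and satisfies:

$$f_\alpha(d) \le f_\alpha(d_0)\left(\frac{d}{d_0}\right)^{-(\frac{N}{2}-1)}, \quad \text{for } d \ge d_0, \tag{5a}$$

$$f_\alpha(d) \le f_\alpha(d_0)\left(\frac{d}{d_0}\right)^{-(\frac{N}{2}-1)}, \quad \text{for } d \le d_0. \tag{5b}$$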
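A sketch of one possible argument for Lemma 1, under the normalization of $f_\alpha(d_0)$ chosen above (write $s$ for the argument of $f_\alpha$). Multiplying (4) by $s^{N/2-1}$ gives

$$\frac{\mathrm{d}}{\mathrm{d}s}\Big[s^{N/2} f'_\alpha(s)\Big] = s^{N/2-1}\Big[f''_\alpha(s)\,s + \frac{N}{2} f'_\alpha(s)\Big] \le 0,$$

so $s^{N/2} f'_\alpha(s)$ is non-increasing. Hence $f'_\alpha$ changes sign at most once, from positive to negative, which allows at most one local maximum and no local minimum. The harmonic comparison function $h(s) = f_\alpha(d_0)(s/d_0)^{-(N/2-1)}$ satisfies $h(d_0) = f_\alpha(d_0)$ and, by the choice of constant, $s^{N/2}h'(s) \equiv d_0^{N/2} f'_\alpha(d_0)$. Therefore $f'_\alpha \le h'$ for $s \ge d_0$ and $f'_\alpha \ge h'$ for $s \le d_0$, and integrating away from $d_0$ in either direction gives $f_\alpha \le h$.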

Remarks. 1. By local minima (maxima) we mean strict minima (maxima). For strictly sub-harmonic functions, we can also assure the non-existence of non-strict extrema.
2. Equality in equations (5) for every $d > 0$ implies equality in (4), which implies that $f_\alpha(\|x-\eta\|^2)$ is a harmonic function on $\mathbb{R}^N - \{\eta\}$. Whenever this holds $\forall \alpha \in A$, not only are there no strict local minima of $V(x)$ outside $\bar{D}|_{\mathbb{R}^N}$, but there are also no strict local maxima of $V(\cdot)$ at those points.

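For this particular case (where for simplicity we use $d_0 = 1$, w.l.o.g.):

$$V(x) = \int_{\eta \in \mathbb{R}^N} \frac{\hat{\mu}(d\eta)}{\|x-\eta\|^{N-2}} \tag{6}$$

where

$$\hat{\mu}(\cdot) = \int_{\alpha \in A} f_\alpha(1)\,\mu(\cdot, d\alpha)$$

is a signed measure on $\mathbb{R}^N$.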

The derivation of the representation of $V(\cdot)$ was done under the most general conditions. Usually, however, one is interested in an associative memory with a discrete number of memories, possibly of different types. Let, therefore, $A = \{1,\ldots,K\}$, where $K$ is the number of particles, and let the $\alpha$-th memory be located at $u^{(\alpha)} \in \mathbb{R}^N$, i.e., $\mu(\eta,\alpha) = \mathbf{1}_{\{\eta = u^{(\alpha)}\}}$. We concentrate from now on on this class of systems, which is represented by

$$V(x) = \sum_{\alpha \in A} f_\alpha(\|x-u^{(\alpha)}\|^2) \tag{7}$$


For example, when we combine (6) and (7), we obtain:

$$V(x) = \sum_{\alpha \in A} \frac{f_\alpha(1)}{\|x-u^{(\alpha)}\|^{N-2}} \tag{8}$$

which is exactly the electrostatic potential in the particular case of $N = 3$.
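To make recall under (1) concrete, here is a minimal numerical sketch with the potential (8). It assumes $N = 3$, all memories attractive with $f_\alpha(1) = -1$, and a normalized, step-capped Euler scheme. That scheme is our choice, not the paper's: it follows the direction field of (1) on a rescaled clock and never steps over a singular point $u^{(\alpha)}$. The names `grad_V` and `recall` are hypothetical.

```python
import numpy as np

def grad_V(x, memories, charges):
    """Gradient of V(x) = sum_a q_a / ||x - u_a||^(N-2), i.e. eq. (8) with
    q_a standing for f_a(1) (q_a < 0: attractive memory, type (A))."""
    N = x.size
    diff = x - memories                      # (K, N): x - u_a for every memory
    dist = np.linalg.norm(diff, axis=1)      # (K,)
    # d/dx ||x - u||^-(N-2) = -(N-2) (x - u) / ||x - u||^N
    coeff = -(N - 2) * charges / dist**N
    return (coeff[:, None] * diff).sum(axis=0)

def recall(x0, memories, charges, max_step=0.05, tol=1e-3, max_iter=100_000):
    """Follow x' = -grad V(x), eq. (1), until x is within tol of a memory;
    return that memory's index."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(x - memories, axis=1)
        if dist.min() < tol:
            return int(dist.argmin())
        v = -grad_V(x, memories, charges)
        speed = np.linalg.norm(v)
        if speed == 0.0:
            raise RuntimeError("stopped at a critical point (e.g. a saddle)")
        # Assumed discretization: unit direction of -grad V, step capped at
        # max_step and at half the distance to the nearest memory.
        step = min(max_step, 0.5 * dist.min())
        x = x + step * v / speed
    raise RuntimeError("no convergence within max_iter steps")
```

For instance, with `memories = np.eye(3)` and `charges = -np.ones(3)`, a probe at `[0.9, 0.05, 0.05]` should be recalled as memory `0`.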

For any value of $N \ge 3$, and $V(\cdot)$ given by (7) with $f_\alpha(\cdot)$ satisfying (4), we distinguish between two types of memories:

(A) Attractive memories, with $f_\alpha(\cdot)$ monotonically increasing at least at some $d_0 < \infty$, in which case $\lim_{d \to 0} f_\alpha(d) = -\infty$ (from (5b)), and from (5a) we note
that (at least when $f_\alpha(\cdot)$ is non-decreasing everywhere) the potential induced by the memory at $u^{(\alpha)}$ should approach its limiting value rapidly as $d/d_0$ increases. For those memories:

$$\lim_{x \to u^{(\alpha)}} V(x) = -\infty \tag{9a}$$

so that they are the global (and also local) minima of the potential function, and correspond to relatively short-range interactions.

(B) Repulsive memories, with $f_\alpha(\cdot)$ monotonically non-increasing, in which case $\lim_{d \to 0} f_\alpha(d) > -\infty$.

In that case, $u^{(\alpha)}$ is not a strict local minimum of $V(\cdot)$, and two behaviors are possible:

(B1) $\lim_{d \to 0} f_\alpha(d) = \infty$, $\lim_{d \to \infty} f_\alpha(d) = \text{const}$. An example is the electrostatic force of a negative particle (instead of the positive one in (A)). These are relatively strong, short-range interactions, and are of interest if one wishes to "avoid" specific locations (as now $u^{(\alpha)}$ is a strict global maximum of $V(\cdot)$).

(B2) $\lim_{d \to 0} f_\alpha(d) = 0$ (and usually $\lim_{d \to \infty} f_\alpha(d) = -\infty$). An example is $f(d) = -d$ ("repulsive spring"); such forces are weak in the short range but strong in the long range. We do not use this kind of repulsive memory in the sequel.

In particular, for the electrostatic form of the potential given in (8), $V(\cdot)$ is a harmonic function outside $\{u^{(\alpha)}\}$, and thus has all its local minima at the attractive memories and all its local maxima at the repulsive memories of type (B1). Thus, we can store two kinds of objects in the same system. While the recall process using (1) gives rise to objects of type (A), the same recall process with $-V(\cdot)$ instead of $V(\cdot)$ gives rise to objects of type (B1).

The potentials given by (4) and (7) possess the major property one expects from an associative memory: the desired memories are arbitrarily chosen, their recall is guaranteed, and their number and distribution are unrestricted. Furthermore, our assumption (A3), together with the properties of the potential-type ODE's in (1) (c.f. [14]), guarantees that, except for a set of measure zero of initial probes (those attracted to saddle points), every initial probe $x(0)$ will converge to a desired memory $u^{(\alpha)}$ of type (A).

Our assumptions (A1) and (A2) made the mathematical analysis tractable. When some of them are omitted, the class of potentials with property (A3) is enlarged. For example, without (A1) we obtain (3) with $f_\alpha(x,\eta)$ instead of $f_\alpha(\|x-\eta\|^2)$, and (4) is replaced by:

$$\forall \eta \in \mathbb{R}^N,\ \forall x \in \mathbb{R}^N - \{\eta\}: \quad \Delta_x f_\alpha(x,\eta) \le 0 \tag{10}$$

where $\Delta_x$ stands for the Laplacian operator w.r.t. $x$. This corresponds to a "non-homogeneous" state space, but complicates the mathematical analysis.

To compare our class of "neural networks" with the model of [1], as well as with the Information Theory bounds on error correcting codes, we derive the discrete-time, finite-state-space analogue of the evolution (1).

Consider the state space as the unit hypercube in $\mathbb{R}^N$, to be denoted by $H^N$. For any potential function $V(x)$, the relaxation algorithm is (in the spirit of [12]):

(A) According to some predetermined probability measure, pick a point $y \in H^N$ having Hamming distance one from the current state $x \in H^N$.

(B) If $V(y) < V(x)$, then the new state will be $y$; otherwise it remains $x$. In both cases return to step (A).
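Steps (A)-(B) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The hypercube is taken as $\{0,1\}^N$. The coordinate to flip is chosen uniformly at random, which is one admissible "predetermined probability measure". The potential is of the form (11) below with $f(d) = -d^{-m}$, applied to the squared normalized Hamming distance used in Section III. The helper names `hamming`, `make_potential` and `relax` are hypothetical.

```python
import random

def hamming(x, y):
    """Number of coordinates in which binary vectors x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def make_potential(codewords, m):
    """Hypothetical helper: V(x) = sum_i f(||x - u_i||^2), f(d) = -d**(-m),
    with ||.|| the normalized Hamming distance (an assumption)."""
    N = len(codewords[0])
    def V(x):
        total = 0.0
        for u in codewords:
            d = (hamming(x, u) / N) ** 2
            if d == 0.0:
                return float("-inf")       # x coincides with a stored memory
            total -= d ** (-m)
        return total
    return V

def relax(x0, V, n_steps, rng):
    """Steps (A)-(B): pick a Hamming neighbour (uniform coordinate choice,
    assumed) and move there only if the potential strictly decreases."""
    x = list(x0)
    for _ in range(n_steps):
        i = rng.randrange(len(x))          # (A) choose y at Hamming distance one
        y = x.copy()
        y[i] ^= 1
        if V(y) < V(x):                    # (B) accept only if V(y) < V(x)
            x = y
    return x
```

With the two code words all-zeros and all-ones in $N = 16$ and $m = 8$, a state with three ones should relax to all-zeros once each of those three coordinates has been picked.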

As shown in [12], for any $V(\cdot)$ and $x(0)$, this algorithm converges to a fixed point in $H^N$. For any practical memory of this type, $\{u^{(\alpha)}\}_{\alpha \in A} \subset H^N$, and thus $A$ is a finite set with $K = |A| \le 2^N$.

Whereas the class of memories suggested here is of the form

$$V(x) = \sum_{i=1}^{K} f_i(\|x-u^{(i)}\|^2) \tag{11}$$

with $f_i(\cdot)$ satisfying (4), the model suggested in [1] corresponds to (11) with $f_i(d) = \frac{1}{2}\left[N - (N - 2d)^2\right]$, which does not satisfy (4).

Note that the more complex versions of this model (c.f., [5,6,8]) do not satisfy assumptions (A1), (A2) at all.

In the next section we analyze the continuous-time model (the ODE evolution) in terms of its basins of attraction and convergence rate. In Section III, the discrete-time version is analyzed: the capacity ($K$) is related to the error correction capability and compared with known results on Hopfield's model. The last section is devoted to a rough complexity analysis of both models, a comparison with the classical Hamming classifier (using minimal distance search), and possible generalizations of (1) which allow for more complicated tasks such as clustering, supervised learning, etc.


II. BASINS OF ATTRACTION AND CONVERGENCE RATE


The potential given by (7) usually allows for an infinite number of distinct stable states (memories), thus having infinite capacity. This, however, does not reveal the shape of the basins of attraction of these memories.

To analyze the performance of the system in (7) as an associative memory, we make the simplifying assumptions that $f_\alpha(\cdot)$ is a monotonically nondecreasing function, independent of $\alpha$, and that $\|u^{(\alpha)} - u^{(\alpha')}\| \ge 1$ for every $\alpha \neq \alpha' \in A$.

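We shall investigate the value of:

$$\epsilon_{\max} \triangleq \min_{\{u^{(\alpha)}\}_{\alpha \in A}} \left[\min_{\alpha \in A} \max\left\{\rho :\ \|x(0) - u^{(\alpha)}\| \le \rho \text{ implies } \lim_{t \to \infty} x(t) = u^{(\alpha)}\right\}\right] \tag{12}$$

It is clear from symmetry arguments that $\epsilon_{\max} \le 1/2$ (where the outer minimization is over all possible positions of $\{u^{(\alpha)}\}_{\alpha \in A}$ in $\mathbb{R}^N$).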

Our aim is to show that a proper choice of $f(\cdot)$ (which is sub-harmonic) will make $\epsilon_{\max}$ as close to $1/2$ as desired, so that the "maximal" basins of attraction can be guaranteed.

The following lower bound on $\epsilon_{\max}$ is derived in the Appendix by bounding the maximal contribution of farther-away particles to the forces on the boundary of the $\epsilon_{\max}$ sphere around $u^{(\alpha)}$:


Lemma 2. (a) $\epsilon_{\max}$ is larger than any value of $r < 1$ satisfying:

$$f'(r^2)\,r > \int_{1}^{\infty} -\frac{\mathrm{d}}{\mathrm{d}t}\Big\{(t-r)\,f'\big((t-r)^2\big)\Big\}\,(2t+1)^N\,dt \tag{13}$$

(b) This is a tight bound in the sense that whenever the r.h.s. of (13) diverges, $\epsilon_{\max} = 0$.

We shall now restrict our attention to $f(d) = -k\left(\frac{d}{d_0}\right)^{-m}$ with $m \ge \frac{N}{2} - 1$ an integer (this guarantees that $f(\cdot)$ satisfies equation (4)).
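For this family, condition (4) can be checked in one line. Here we assume $k > 0$, so that $f$ is increasing and the memories are attractive:

$$f'(d) = \frac{km}{d_0}\Big(\frac{d}{d_0}\Big)^{-m-1}, \qquad f''(d)\,d = -\frac{km(m+1)}{d_0}\Big(\frac{d}{d_0}\Big)^{-m-1},$$

$$f''(d)\,d + \frac{N}{2} f'(d) = \frac{km}{d_0}\Big(\frac{d}{d_0}\Big)^{-m-1}\Big(\frac{N}{2} - m - 1\Big) \le 0 \iff m \ge \frac{N}{2} - 1,$$

with equality (the harmonic case) exactly at $m = \frac{N}{2} - 1$.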
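The r.h.s. of (13) is finite iff $m > \frac{N-1}{2}$, and then, for integer $m$, (13) is exactly:

$$(kmd_0^m)\,r^{-(2m+1)} > (kmd_0^m)\,3^N (1-r)^{-(2m+1)} \sum_{j=0}^{N} \binom{N}{j} \frac{2m+1}{2m+1-j} \left[\frac{1+2r}{3}\right]^{N-j} \left[\frac{2}{3}(1-r)\right]^{j} \tag{14}$$

which, since the binomial weights in (14) sum to one and $\frac{2m+1}{2m+1-j} \le \frac{2m+1}{2m+1-N}$, leads to

$$\frac{1}{2} \ge \epsilon_{\max}(m,N) \ge \left\{1 + \left[\frac{(2m+1)\,3^N}{2m+1-N}\right]^{1/(2m+1)}\right\}^{-1} \tag{15}$$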

So, for any $N$, $\lim_{m \to \infty} \epsilon_{\max}(m,N) = 1/2$, and for any $m = \frac{1}{2}(k(N+1) - 1)$ with $k \ge 1$ fixed, the r.h.s. of (15) tends to $1/(1 + \sqrt[k]{3})$ as $N \to \infty$. So even for $m = N/2$, $\epsilon_{\max}(m,N) \gtrsim \frac{1}{4}$ for $N$ large enough.
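The lower bound (15) is easy to tabulate. The sketch below evaluates its right-hand side in log space, since $3^N$ overflows a double for $N$ in the hundreds. The name `eps_lower_bound` is hypothetical. The check only concerns the bound, not $\epsilon_{\max}$ itself.

```python
import math

def eps_lower_bound(m, N):
    """Right-hand side of (15): a lower bound on eps_max(m, N) for
    f(d) = -k (d/d0)^(-m) and memories at mutual distance >= 1.
    Requires 2m + 1 > N, i.e. m > (N - 1)/2, so that (13) is finite."""
    if 2 * m + 1 <= N:
        raise ValueError("the bound requires m > (N - 1) / 2")
    # log of (2m+1) 3^N / (2m+1-N), kept in logs to avoid overflow
    log_b = math.log(2 * m + 1) + N * math.log(3) - math.log(2 * m + 1 - N)
    return 1.0 / (1.0 + math.exp(log_b / (2 * m + 1)))
```

For example, `eps_lower_bound(500, 1000)` (the case $m = N/2$) is about 0.249, close to $1/4$. `eps_lower_bound(1501, 1000)` (the case $k = 3$) is about 0.4095, close to $1/(1+\sqrt[3]{3}) \approx 0.4095$. As $m$ grows with $N$ fixed, the value approaches $1/2$.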


Remarks. 1. If the $\{u^{(\alpha)}\}_{\alpha \in A}$ are restricted to be contained in a sphere of radius $\rho$ in $\mathbb{R}^N$, then the integral on the r.h.s. of (13) will have upper limit $2\rho$, and the additional term $(2\rho - r)\,f'((2\rho - r)^2)\,(4\rho + 1)^N$ would be added there. It is thus finite for every value of $m$ (including the harmonic case $m = \frac{N}{2} - 1$), implying $\epsilon_{\max} > 0$ under this restriction.


2. For $N \to \infty$ and fixed $k$, the $1/(1 + \sqrt[k]{3})$ behaviour is maintained even when we consider only memories $\{u^{(\alpha)}\}$ contained in the unit sphere ($\rho = 1$), as a refinement of the arguments of Lemma 2 shows.

As for the rate of convergence, it is easy to verify that for a large value of $m$, and $x(0)$ far from all the $\{u^{(\alpha)}\}_{\alpha \in A}$, it will take the evolution in (1) a long time before $x(t)$ is near one of the $\{u^{(\alpha)}\}_{\alpha \in A}$. However (as we prove in the Appendix):


Lemma 3. Let $Q$ be any closed set in $\mathbb{R}^N$ whose interior includes the convex hull of $\{u^{(\alpha)}\}_{\alpha \in A} \cup \{u\}$, where $u$ is an arbitrary point in $\mathbb{R}^N$ (possibly within the convex hull of $\{u^{(\alpha)}\}_{\alpha \in A}$). Then, adding $\mathbf{1}_{x \notin Q}\, g(\|x-u\|^2)$ to $V(x)$, where $g(\cdot)$ is any nondecreasing, differentiable function, will neither disturb (A3) nor create additional fixed points of (1), provided all the $f_\alpha(\cdot)$ which compose $V(x)$ are monotonically increasing.

Therefore, if for example $g(d) = d^\beta$ is added (with $\beta > 1$), then the convergence from $x(0)$ at infinity to a point with squared distance $d_Q$ from the points $\{u^{(\alpha)}\}$ and $u$ (with $d_Q$ much larger than the squared distances between these $|A| + 1$ points) takes the time $T \approx d_Q^{\,1-\beta}/[4\beta(\beta-1)]$. Thus, by using $\beta$ large enough, convergence from infinity to $\partial Q$ (the boundary of $Q$) can take an arbitrarily small time.
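The time estimate can be reproduced under the same simplification the text makes. Far outside $Q$, only the added term $g(s) = s^\beta$, with $s = \|x-u\|^2$, is assumed to matter, and the forces of the memories are neglected:

$$\dot{x} = -\nabla_x g(\|x-u\|^2) = -2\beta s^{\beta-1}(x-u), \qquad \dot{s} = 2(x-u)\cdot\dot{x} = -4\beta s^{\beta},$$

$$T = \int_{d_Q}^{\infty} \frac{ds}{4\beta s^{\beta}} = \frac{d_Q^{\,1-\beta}}{4\beta(\beta-1)}, \qquad \beta > 1.$$

For $d_Q > 1$ both factors decrease as $\beta$ grows, so $T \to 0$ as $\beta \to \infty$.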

Global investigation of the rate of convergence inside the convex hull of $\{u^{(\alpha)}\}_{\alpha \in A}$ is quite cumbersome. Thus, let us restrict the discussion again to the case where $\|u^{(\alpha)} - u^{(\alpha')}\| \ge 1$ and $f(d) = -k(d/d_0)^{-m}$ (with $m > \frac{N-1}{2}$ an integer). Furthermore, let $x(0)$ satisfy $\|x(0) - u^{(\alpha)}\| \le \theta\,\epsilon(m,N)$, where $\theta < 1$ and $\epsilon(m,N)$ is the lower bound on $\epsilon_{\max}(m,N)$ given by the r.h.s. of (15).

We have already seen that for every $\theta < 1$, $\lim_{t \to \infty} x(t) = u^{(\alpha)}$, but (as we prove in the Appendix):

Lemma 4. Under the above conditions $x(T) = u^{(\alpha)}$, where:


d0\  _  [8(1  -  e(m,N)) 

2mk '  ^  *-(1  -  6c(m,N)) 


n2m+li -1 

J  }  - 


(16) 


So, for $m$ large enough, with $\epsilon(m,N)$ as above and $k = d_0/2m$, we obtain $\log T \approx 2(m+1)\log\theta$. Again, by enlarging $m$ while keeping $\theta$ fixed, $T$ can become arbitrarily small.


To conclude, the "maximal" basins of attraction can be guaranteed by enlarging $m$ (choosing strong, short-range interactions), and this will also speed up the convergence within these basins of attraction (which is completed in an

III. DISCRETE TIME EVOLUTION ON THE UNIT HYPERCUBE


Whereas the discrete-time algorithm presented in the Introduction uses a $V(\cdot)$ which has no local minima outside $\{u^{(\alpha)}\}_{\alpha \in A}$, the algorithm might still possess fixed points outside this set. The reason is the "rigidity" of the algorithm, which might not allow descent in the gradient direction, because the search for a lower potential is limited to the Hamming-distance-one neighborhood of every $x \in H^N$.

We can, however, show that the proposed class of potentials is optimal according to the Information Theory bounds (as $N \to \infty$), and in particular can be used to design error correcting codes with positive rate (c.f., [15]).

Let us restrict the discussion to potentials of the form (11), with $f(d) = -d^{-m}$, $m \ge \frac{N}{2} - 1$. For simplicity of notation we use the normalized Hamming distance $\|x-y\| = \frac{1}{N}\sum_{i=1}^{N} |x_i - y_i|$, so that $H^N$ is contained in the unit sphere, and assume that $\forall i \neq j$, $\|u^{(i)} - u^{(j)}\| \ge 2\rho$, with $\frac{1}{2} \ge \rho > \frac{1}{N}$ fixed. Therefore, these code words $\{u^{(i)}\}_{i=1}^{K}$ can tolerate up to $\rho N$ errors in $N$ coordinates.

As we prove in the Appendix, by bounding the total "force" that the farther-away particles apply on $x(0)$, we obtain:

Theorem 2. For every $x(0)$ such that $\|x(0) - u^{(i)}\| \le 2\theta\rho$, $0 \le \theta < \frac{1}{2}$, and:


(K-l)  <  ( 


¥) 


2m 


r 

, ,  2 . -2m 

i 

rH 

v  NDJ 

¥ 

2  .  -2m  , 

-  — )  -  1 

Np; 

(17) 


(a) The discrete-time algorithm will generate a sequence of states $\{x(n)\}_{n=0}^{\infty}$ such that, $\forall n$, $\|x(n) - u^{(i)}\| \le \|x(n-1) - u^{(i)}\|$, with equality iff $x(n-1) = x(n)$.

(b) The state changes exactly $N\|x(0) - u^{(i)}\|$ times along this sequence, and if each coordinate has positive probability of being chosen as the updated coordinate, then $x(n)$ converges to $u^{(i)}$ with probability one.
Consider  now  p  >  0,  and  6  <  fixed,  with  m  _>  t{1o22  +  c(^^)}  (where  e  >  0
is arbitrarily small). For this case, if $N$ is large enough, the r.h.s. of (17) is larger than $2^N$; thus $K$ is determined only by the bounds on the maximal number of points in $H^N$ satisfying $\forall i \neq j$, $\|u^{(i)} - u^{(j)}\| \ge 2\rho$, i.e., by the asymptotic Information Theory sphere-packing bounds for error correcting codes (c.f., [15]).

To conclude, Theorem 2 guarantees that for short-range forces (i.e., $m(N)$ large enough) and large enough dimension, direct convergence (c.f., [3] for this definition) to the nearest code word is obtained, independently of the number of code words and their locations, provided that the initial distance is smaller than $\frac{1}{2}\min_{i \neq j}\|u^{(i)} - u^{(j)}\|$.

For comparison, for the model of [1], which has "strong" forces, the maximal number of memories is bounded above by $N$ even for $\theta = 0$ (i.e., recall with no errors). Thus, this model has zero rate when regarded as an error correcting code (c.f., [16]). Even when it converges, the convergence time might grow exponentially with $N$, unlike the linear time guaranteed by Theorem 2 (when a proper selection of the updated coordinate is made).

Although Theorem 2 was derived for the simplest case of equal desired basins of attraction, it can be generalized to other situations.
IV. LEARNING AND COMPLEXITY


A. Learning

The associative memory models presented in the preceding two sections are capable of both storing information and recalling it. The analysis was restricted to the Euclidean state space and the unit hypercube (equipped with the Hamming norm) merely for simplicity of presentation, and to enable comparison with Hopfield models (c.f., [1,13]).

When the state space is an arbitrary Riemannian manifold, the evolution (1) can be defined more abstractly as the potential ODE's on that manifold, with the gradient and Laplacian operators in (1) and (10) defined on the manifold. Since the maximum principle/Gauss theorem (c.f., [16]), which was the key to Theorem 1, is also valid on any Riemannian manifold, most of the results in this work can be extended to this more general context.

As for the discrete-time version of the algorithm, it can easily be extended to any finite graph whose vertices are embedded in $\mathbb{R}^N$ (c.f., [12]).

The process of storage and recall of information described in this work involves neither learning nor generalization (in the sense of [17]). It is also incapable of creating periodic orbits (as done, for example, in [4]). However, by further generalizing the kinematical laws, one can incorporate most of these phenomena.

For example, periodic orbits can be generated by modifying (1) to the more "classical" equation of motion:

$$\ddot{x} = -\frac{1}{m}\nabla V(x) - P\dot{x} \tag{18}$$

which for $N = 3$ is just the Newtonian motion of a particle with mass $m$ in the field of the potential $V(\cdot)$, with viscosity coefficient $P$.

Likewise, learning can be obtained by modifying the locations of the $\{u^{(\alpha)}\}$ during the recall operation, either in response to the distribution of the initial states $x(0)$, or to an external teaching procedure, or by adding inter-particle interactions. These modifications can be implemented within the evolution (1) (or (18)) by allowing the state $x(t)$ to be represented by a non-negligible particle, which applies forces on the given $\{u^{(\alpha)}\}_{\alpha \in A}$. Generalization (which is basically a spontaneous creation of clustering) is easily obtained once the $\{u^{(\alpha)}\}$ particles are allowed to apply forces on one another and change their locations.

Of course, to make all these remarks valid, goals should be defined rigorously, and mathematical/physical rules that achieve them should be incorporated within this framework.

We conclude this subject by pointing out that we have shown that there is nothing special about Spin-Glass models: other known models in physics and mathematics possess the "emergent collective computational abilities", once they are properly interpreted.


B. Complexity

Our proposed models have better performance than the models in [1,13], but what about the implementation complexity?

For comparison purposes we deal with three algorithms on $H^N$. The first one is the classical Hamming decoder w.r.t. $\{u^{(i)}\}_{i=1}^{K} \subset H^N$. It involves the parallel computation of the $K$ correlations $(u^{(i)}, x(0))$ (where $x(0) \in H^N$ as well), followed by a search for the maximal value, implemented in a tree structure. Thus, $KN$ multiplications are needed together with $K \log K$ comparisons of pairs of numbers, and the delay of the algorithm is $\log K + 1$ "unit" times (where comparison and multiplication are assumed equivalent throughout).

The second algorithm is the one suggested in [1]. Each iteration involves $KN$ multiplications and $N$ comparisons of pairs of numbers (since $K \le N$ for this algorithm, as shown in [3,6,7,9,10]). The time delay, however, is the number of iterations ("full sweeps", as all $N$ coordinates are updated asynchronously), which is believed to be independent of $K$.

The last algorithm is the one suggested here in Eq. (11), with $f_i(d) = -d^{-m}$. It involves $KN$ multiplications in each iteration for obtaining the $d$'s. The operation of $f_i(\cdot)$ is quite simple when done by an analogue computer: one diode takes $\log d$, then multiplication by $(-m)$ is done, and finally a second diode computes $-\exp[-m(\log d)] = -d^{-m}$. So the overall complexity is again determined by the $KN$ multipliers, and the time delay (in "full sweeps") is again a small constant, as Theorem 2 implies.
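The cost accounting above can be illustrated with a vectorized sketch. All $K$ distances come from one $K \times N$ matrix-vector product, and $f_i$ is evaluated through the same log, multiply-by-$(-m)$, exp route as the analogue circuit. It assumes 0/1 states and takes $d$ to be the normalized Hamming distance; the name `potential_and_distances` is hypothetical.

```python
import numpy as np

def potential_and_distances(x, U, m):
    """V(x) = -sum_i d_i^(-m), i.e. (11) with f_i(d) = -d^(-m), for a 0/1 state
    x (length N) and K code words stored as the rows of U (K x N).
    d_i is taken to be the normalized Hamming distance (an assumption)."""
    N = U.shape[1]
    # For 0/1 vectors, Hamming(x, u) = |x| + |u| - 2 <x, u>, so one
    # matrix-vector product (K*N multiplications) gives all K distances.
    ham = x.sum() + U.sum(axis=1) - 2 * (U @ x)
    d = ham / N
    # Analogue route from the text: log d, multiply by -m, exponentiate, negate.
    # On a stored word log(0) = -inf, so that term becomes -inf.
    with np.errstate(divide="ignore", over="ignore"):
        terms = -np.exp(-m * np.log(d))
    return terms.sum(), d
```

For `U = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])`, `x = np.array([1, 0, 0, 0])` and `m = 2`, the distances are `[0.25, 0.75]` and $V = -(16 + 1/0.5625)$.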

Thus, for $K < N$, the new algorithm has the same complexity as Hopfield's scheme and the classical Hamming decoder, with a smaller time delay than the first two algorithms.

This result is also true for $K \sim 2^{h(\rho)N}$, but then the Hopfield model cannot be used, whereas the new algorithm has complexity which is linear in $K$, i.e., exponential in $N$. In this case it is better (in time delay) than the classical Hamming decoder, but does not admit the polynomial complexity of some of the special error correcting codes used in coding theory (c.f., [15]).
APPENDIX


Proof of Theorem 1.

We use assumption (A1) for the case of atomic measures on $\mathbb{R}^N \times A$, i.e., $V(x) = f(x,\eta_0,\alpha_0)$. Consider first a translation of the coordinates, i.e., $x' = x + \Delta$, with the proper translation of the atomic measure $\mu$, i.e., $\eta'_0 = \eta_0 + \Delta$. As assumption (A1) implies that $V(x') = V(x)$, we have $f(x,\eta_0,\alpha_0) = f(x+\Delta, \eta_0+\Delta, \alpha_0)$ for every $\eta_0, x, \Delta \in \mathbb{R}^N$ and every $\alpha_0 \in A$. Thus, $f(x,\eta_0,\alpha_0)$ depends only on $x - \eta_0$. Repeating this argument for a rotation of the coordinates proves that $f(x,\eta_0,\alpha_0)$ depends only on $\|x - \eta_0\|^2$ for every $\alpha_0 \in A$.

Thus, the structural assumptions (A1) and (A2) impose that $V(\cdot)$ is of the form given in (3).

We now assume that (4) is satisfied for every $\alpha \in A$. It is easily verified that (4) is equivalent to:

$$\forall x \in \mathbb{R}^N - \{\eta\}: \quad \Delta_x f_\alpha(\|x-\eta\|^2) \le 0 \tag{A.1}$$

where  Ax  is  the  Laplacian  operator  w.r.t.  x.  Equation  (A.l)  implies  in  view 

N 

of  (3),  that  for  every  neighborhood  U(x)  of  x  £  IR  , 


in  which  p(nxdct)  is  identically  zero, 

*  ctGA 

A  V(-)  £  0  (in  U(x)).  In  deriving  this  result  we  used  the  smoothness  assumption 

on  V,  together  with  integrability  assumption  on  A^ff-)  to  allow  for  changing  the 

order  of  differentiation  and  integration.  Suppose  now  that  f  (•)  satisfies  (4) 

Va  €  A,  but  there  exists  a  local  minima  at  x^  €  IR‘  with  U(Xq)  in  which 
* 

U(nxda)  =  0.  On  U(xn) ,  A  V(-)  <  0,  so  the  maximum  principle  implies  that  the 
J  ct£A  x 

minimum,  of  V(-)  in  any  closed  subset  of  U(Xq)  is  obtained  on  the  boundary  of 


U(Xq)  (c.f. ,  [IS]).  However,  since  is  a  local  minima  there  exists  a  small, 
closed  neighborhood  around  it  such  that  Y(Xq)  <  Inf  Y(x)  on  that  neighborhood, 
so  contradiction  is  obtained. 
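As a concrete check of (A.1), take $f(d) = -d^{-m}$. This is the kernel that the diode chain of the complexity discussion computes, and it is used here only as an illustrative assumption. With $d = \|x-\eta\|^2$, the standard identity $\Delta_x f(\|x-\eta\|^2) = 2Nf'(d) + 4d f''(d)$ gives $2m d^{-m-1}(N - 2m - 2)$. So, for $m > 0$, (A.1) holds exactly when $m \ge (N-2)/2$. The Python sketch below compares this closed form with a central finite-difference Laplacian. The function names are hypothetical.

```python
def f(d, m):
    """Assumed kernel f(d) = -d**(-m), where d is a squared distance."""
    return -d ** (-m)


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def laplacian_closed_form(x, eta, m):
    """2N f'(d) + 4 d f''(d) for f(d) = -d**(-m), with d = ||x - eta||^2."""
    n_dim = len(x)
    d = sq_dist(x, eta)
    f1 = m * d ** (-m - 1)               # f'(d)
    f2 = -m * (m + 1) * d ** (-m - 2)    # f''(d)
    return 2 * n_dim * f1 + 4 * d * f2


def laplacian_finite_difference(x, eta, m, h=1e-3):
    """Sum over coordinates of central second differences of x -> f(||x - eta||^2)."""
    center = f(sq_dist(x, eta), m)
    total = 0.0
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        total += (f(sq_dist(plus, eta), m) - 2 * center + f(sq_dist(minus, eta), m)) / h ** 2
    return total


x, eta = [1.0, 0.5, 0.2], [0.0, 0.0, 0.0]   # N = 3, so the threshold is m = 0.5
for m in (0.2, 0.5, 1.0):
    print(m, laplacian_closed_form(x, eta, m), laplacian_finite_difference(x, eta, m))
```

For $N = 3$ and $m = 0.5$ the kernel is $-1/\|x-\eta\|$, which is harmonic, so both columns are close to zero. Smaller $m$ gives a positive Laplacian and violates (A.1). Larger $m$ satisfies it.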

Remark: We have shown that Eq. (4) guarantees that assumption (A3) holds, where only isolated points are interpreted as local minima. Refinement of the above arguments leads to the elimination of constant surfaces of local minima whenever strict inequality holds in (4).

To complete the "only if" part of Theorem 1, we assume that (4) does not hold for some $\alpha_0 \in A$ and $d_0 > 0$. Consider the case of $V(\cdot)$ generated by (3), with $\mu(d\eta \times d\alpha)$ being the product of an atomic measure at $\alpha_0$ and a uniform measure on the sphere of radius $\sqrt{d_0}$ in $\mathbb{R}^N$. We shall prove that in this case there is a spurious local minimum of $V(\cdot)$ at $x = 0$, which contradicts assumption (A3), as $d_0 > 0$.

At $x = 0$, $\|x - \eta\|^2 = d_0$ for every $\eta$ on the sphere of radius $\sqrt{d_0}$. This implies that $\Delta_x V(\cdot) > 0$ at $x = 0$, since (4), and therefore (A.1), does not hold there. The continuity of $\Delta_x V(\cdot)$ near $x = 0$ is imposed by our smoothness assumption, and guarantees that there is a spherical neighborhood $U(0)$ where $\Delta_x V(\cdot) > 0$.

Since the measure $\mu$ is spherically symmetric, so is the potential $V(\cdot)$ (i.e., $V(x)$ depends only on $\|x\|$). We now apply the maximum principle on the concentric spheres contained in $U(0)$, and see that in any such sphere the maximum of $V(\cdot)$ is obtained only on the boundary. However, the spherical symmetry of $V(\cdot)$ implies that it is constant on these boundaries. Hence $V(0)$ is strictly smaller than the value of $V(\cdot)$ on each of these spheres, i.e., $x = 0$ is a local minimum of $V(\cdot)$, as we claimed above. □

Proof of Lemma 1. Whenever $f'_\alpha(d) = 0$, (4) implies that $f''_\alpha(d) \le 0$. So $f'_\alpha(d)$ can cross the zero level at most once in $d \in (0,\infty)$, and only from positive to negative values. Thus, solutions of (4) possess at most one local maximum and no local minima in $(0,\infty)$. It is easy to verify that (4) is equivalent to

$$f'_\alpha(d)\, d^{N/2} \ \text{is monotonically non-increasing on } (0,\infty). \qquad (A.2)$$

Thus:

$$f'_\alpha(d) \le f'_\alpha(d_0)\left(\frac{d_0}{d}\right)^{N/2}, \quad \text{for } \infty > d \ge d_0 > 0 \qquad (A.3a)$$

$$f'_\alpha(d) \ge f'_\alpha(d_0)\left(\frac{d_0}{d}\right)^{N/2}, \quad \text{for } 0 < d \le d_0 \qquad (A.3b)$$

Integrating (A.3a) from $d_0$ to $d > d_0$, and using the condition

$$f_\alpha(d_0) = -\frac{2 f'_\alpha(d_0)\, d_0}{N-2},$$

leads to (5a), whereas integrating (A.3b) from $d \le d_0$ to $d_0$ leads to (5b), for $N \ge 3$. Similar results can be obtained for $N = 1, 2$, but they are less interesting.
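The equivalence between (4), in its Laplacian form (A.1), and the statement that $f'_\alpha(d)\,d^{N/2}$ is non-increasing follows from one line of calculus. It is written out here for convenience. Write $d = \|x-\eta\|^2$ and use the standard identity $\Delta_x f_\alpha(\|x-\eta\|^2) = 2N f'_\alpha(d) + 4d\, f''_\alpha(d)$:

```latex
\frac{d}{dd}\Big[f'_\alpha(d)\, d^{N/2}\Big]
  = d^{N/2} f''_\alpha(d) + \frac{N}{2}\, d^{N/2-1} f'_\alpha(d)
  = \frac{d^{N/2-1}}{4}\Big[2N f'_\alpha(d) + 4d\, f''_\alpha(d)\Big]
  = \frac{d^{N/2-1}}{4}\,\Delta_x f_\alpha(\|x-\eta\|^2).
```

Since $d^{N/2-1} > 0$, the derivative of $f'_\alpha(d)\,d^{N/2}$ is non-positive exactly where the Laplacian is. For example, $f_\alpha(d) = -d^{-(N-2)/2}$ gives $f'_\alpha(d)\,d^{N/2} = (N-2)/2$, a constant, so this kernel is the boundary case.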


Proof of Lemma 2. (a) Since $f(\cdot)$ is superharmonic (satisfies (4)) and monotonically nondecreasing ($f'(d) \ge 0$), it follows that $f(\cdot)$ satisfies (4) also for $N = 1$. That is, $f'(r^2)r$ is a monotonically nonincreasing function of $r \in \mathbb{R}_+$.

Consider any $r > 0$ such that $\|x - u^{(\alpha)}\| \le r$ implies $\langle W, x - u^{(\alpha)}\rangle \ge 0$, independently of the locations of the other memories $\{u^{(\beta)}\}$. For such an $r$, the evolution (1) will monotonically decrease $\|x - u^{(\alpha)}\|$ until $x(t) = u^{(\alpha)}$. Thus, any such $r$ is a lower bound on $\epsilon_{\max}$. In view of (7):

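$$\frac{\langle W, x-u^{(\alpha)}\rangle}{2\|x-u^{(\alpha)}\|} = \sum_{\beta\in A} f'(\|x-u^{(\beta)}\|^2)\,\frac{\langle x-u^{(\beta)},\, x-u^{(\alpha)}\rangle}{\|x-u^{(\alpha)}\|}$$

$$\ge f'(\|x-u^{(\alpha)}\|^2)\,\|x-u^{(\alpha)}\| - \sum_{\beta\ne\alpha} f'(\|x-u^{(\beta)}\|^2)\,\|x-u^{(\beta)}\|$$

$$\ge f'(\|x-u^{(\alpha)}\|^2)\,\|x-u^{(\alpha)}\| - \sum_{\beta\ne\alpha} f'\big((\|u^{(\alpha)}-u^{(\beta)}\|-\|x-u^{(\alpha)}\|)^2\big)\,\big(\|u^{(\alpha)}-u^{(\beta)}\|-\|x-u^{(\alpha)}\|\big)$$

$$\ge f'(r^2)\,r - \sum_{n=0}^{\infty}\big(L_{1+(n+1)\epsilon}-L_{1+n\epsilon}\big)\, f'\big((1+n\epsilon-r)^2\big)\,(1+n\epsilon-r)$$

$$= f'(r^2)\,r + \sum_{\substack{n\ge 1\\ t_n=1+n\epsilon}} L_{t_n}\,(t_n-t_{n-1})\,\frac{f'((t_n-r)^2)(t_n-r)-f'((t_{n-1}-r)^2)(t_{n-1}-r)}{t_n-t_{n-1}} \qquad (A.4)$$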


where $L_t = |\{\beta : \|u^{(\beta)} - u^{(\alpha)}\| < t,\ \beta \ne \alpha,\ \beta \in A\}|$ and $\epsilon > 0$ is an arbitrary constant.


The first inequality in (A.4) comes from the Cauchy-Schwarz inequality. The second comes from the triangle inequality and the monotonicity of $f'(r^2)r$ w.r.t. $r$. The third comes from the condition $\|x - u^{(\alpha)}\| \le r < 1$ and the monotonicity of $f'(r^2)r$. The lower limit on $n$ is because $L_t = 0$ for $t \le 1$, and the last equality holds due to this fact.

Since $r < 1$, $f'((t-r)^2)(t-r)$ possesses a continuous derivative on $t \in [1,\infty)$. Moreover, $L_t$ is measurable, since it is composed of a countable number of discrete steps. Hence the r.h.s. of (A.4) is a continuous function of $\epsilon > 0$ which possesses the limit (as $\epsilon \to 0$)

$$f'(r^2)\,r + \int_1^\infty L_t\, \frac{d}{dt}\{f'((t-r)^2)(t-r)\}\,dt,$$

which is also a lower bound on the l.h.s. of (A.4). In the derivation we assumed that this integral is finite (i.e., at least $\lim_{t\to\infty} \frac{d}{dt}\{f'((t-r)^2)(t-r)\} = 0$).

In case it diverges, the same analysis can be done on $\{u^{(\alpha)}\}$ restricted to lie in a sphere of radius $\rho$. This means $L_t = \text{const}$ for $t > 2\rho$, and then the r.h.s. of (A.4) converges (for $\epsilon \to 0$) to

$$f'(r^2)\,r - L_{2\rho}\, f'((2\rho-r)^2)(2\rho-r) + \int_1^{2\rho} L_t\, \frac{d}{dt}\{f'((t-r)^2)(t-r)\}\,dt.$$


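Thus, (13) will follow from the inequality $L_t \le (2t+1)^N$, since $f'((t-r)^2)(t-r)$ is monotonically non-increasing. Its derivative is therefore non-positive, so replacing $L_t$ by a larger quantity can only decrease the integral.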
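Replacing $L_t$ by $(2t+1)^N$ in the limit integral gives a quantity that can be evaluated numerically. The Python sketch below assumes $f(d) = -d^{-m}$ as an illustrative choice. Then $f'(r^2)r = m r^{-2m-1}$ and $\frac{d}{dt}\{f'((t-r)^2)(t-r)\} = -m(2m+1)(t-r)^{-2m-2}$, and the integral converges only for $m > (N-1)/2$. The code substitutes $t = 1/u$, applies a midpoint rule, and bisects for the point in $(0,1)$ where the bound changes sign. By the argument of part (a), any $r$ below that point at which the bound is positive is a lower bound on $\epsilon_{\max}$ for this $f$. The function names are hypothetical.

```python
def radial_bound(r: float, m: float, n_dim: int, steps: int = 4000) -> float:
    """f'(r^2) r + int_1^inf (2t+1)^N d/dt[f'((t-r)^2)(t-r)] dt, for f(d) = -d**(-m)."""
    # t = 1/u maps [1, inf) onto (0, 1]; int (2t+1)^N (t-r)^(-2m-2) dt becomes
    # int (2+u)^N u^(2m-N) (1-ru)^(-2m-2) du, approximated by the midpoint rule.
    h = 1.0 / steps
    integral = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        integral += (2 + u) ** n_dim * u ** (2 * m - n_dim) * (1 - r * u) ** (-2 * m - 2)
    integral *= h
    return m * r ** (-2 * m - 1) - m * (2 * m + 1) * integral


def radius_lower_bound(m: float, n_dim: int, tol: float = 1e-6) -> float:
    """Bisect for the sign change of radial_bound on (0, 1); it is decreasing in r."""
    if not m > (n_dim - 1) / 2:
        raise ValueError("the integral diverges unless m > (N - 1) / 2")
    lo, hi = 1e-6, 1 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if radial_bound(mid, m, n_dim) > 0:
            lo = mid   # bound still positive: the sign change lies to the right
        else:
            hi = mid
    return lo


print(radius_lower_bound(2.0, 2))
```

The first term decreases in $r$ and the integral increases in $r$, so the bound has a single sign change in $(0,1)$. That is why plain bisection suffices.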


However, this inequality follows from the condition $\|u^{(\alpha)} - u^{(\beta)}\| \ge 1$, $\forall \alpha \ne \beta$. Under this condition, the $N$-dimensional sphere of radius $(t + \frac12)$ around $u^{(\alpha)}$ contains at least $(L_t + 1)$ disjoint spheres of radius $1/2$ each. Comparing the volumes of the large sphere and the $(L_t + 1)$ small ones, $(L_t+1)(1/2)^N \le (t+1/2)^N$, we obtain the desired inequality.
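The packing bound $L_t \le (2t+1)^N$ can be checked on a concrete configuration with minimum distance 1: the integer lattice $\mathbb{Z}^N$ with $u^{(\alpha)} = 0$. The lattice is our illustrative choice, not a construction from the text. The Python sketch below counts $L_t$ by brute-force enumeration, so it is only practical for small $N$ and $t$, and compares the count with $(2t+1)^N$.

```python
import itertools
import math


def lattice_L(t: float, n_dim: int) -> int:
    """L_t for the lattice Z^N around the origin: nonzero points with norm < t."""
    R = math.ceil(t)
    count = 0
    for p in itertools.product(range(-R, R + 1), repeat=n_dim):
        if any(p) and math.sqrt(sum(c * c for c in p)) < t:
            count += 1
    return count


for n_dim in (1, 2, 3):
    for t in (1.0, 1.5, 2.5, 4.0):
        L = lattice_L(t, n_dim)
        print(n_dim, t, L, (2 * t + 1) ** n_dim, L <= (2 * t + 1) ** n_dim)
```

For $t \le 1$ the count is zero, in line with the remark that $L_t = 0$ for $t \le 1$.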


(b) Consider the case when the r.h.s. of (13) diverges. Then, even if we consider this integral with lower limit $T \gg 1$, it still diverges to $+\infty$. A well-known sphere packing result is that there exists $\{u^{(\alpha)}\}_{\alpha\in A}$ such that $\lim_{t\to\infty}[L_t/(2t+1)^N] > \delta > 0$ (cf. [22]). For these $\{u^{(\alpha)}\}_{\alpha\in A}$, the last line in (A.4) can be an arbitrarily large negative number for small $\epsilon$, for every $r > 0$, as $f'(r^2)r$ is finite.

Furthermore, we can obtain this result also when $u^{(\alpha)}$ is at the origin and the $u^{(\beta)}$, $\beta \ne \alpha$, all lie in the upper half space (i.e., the first coordinate of $u^{(\beta)}$ is non-negative). For any $r > 0$, consider the state $x$ whose first coordinate equals $r$ and whose other coordinates are zero. As $t \to \infty$, the distribution of the $\{u^{(\beta)}\}$ elements in an infinitesimal shell between the spheres of radius $t$ and $(t + \Delta t)$ becomes spherically uniform in the upper half space. Therefore the normalized integral of $\cos\theta\, dA$ over this shell has a strictly positive limit, where $dA$ is a volume element on the shell and $\theta$ is the angle w.r.t. the first coordinate axis. Consequently, for the chosen $x$, $\langle W, x - u^{(\alpha)}\rangle = -\infty$ provided that the second line in (A.4) diverges.

However, we already know that the last line in (A.4) diverges, even when only $t \ge T \gg 1$ is considered. For these values of $t = \|u^{(\alpha)} - u^{(\beta)}\|$, we have $\|u^{(\alpha)} - u^{(\beta)}\| \approx \|u^{(\beta)} - x\| + \|u^{(\alpha)} - x\|$, as $r = \|x - u^{(\alpha)}\| \ll t$. Thus, the second and the last lines of (A.4) diverge together.

To conclude, we have shown that there is a sphere packing construction with $u^{(\alpha)} = 0$ with the following property whenever the r.h.s. of (13) diverges. Choose $x(0)$ with an arbitrarily small positive first coordinate and all other coordinates zero. Then $\dot x$ has an arbitrarily large positive first coordinate and, by symmetry arguments, all other coordinates zero. So $x(t)$ moves along the positive part of the first axis and never converges to $u^{(\alpha)} = 0$. Thus, $\epsilon_{\max} = 0$. □
Proof of Lemma 3. Adding $\sum_{u} g(\|x-u\|^2)$ to $V(\cdot)$ changes neither $V(\cdot)$ nor the evolution (1) in the interior of $Q$. Thus, we only have to prove that there are no fixed points of (1) outside $\mathrm{Int}\,Q$.

Assume that $x_0 \notin \mathrm{Int}\,Q$ is a fixed point of (1), and denote by $C \subset \mathrm{Int}\,Q$ the convex hull of $\{u^{(\alpha)}\}_{\alpha\in A} \cup \{u\}$. Since $C$ is a closed set, there is a convex, closed neighborhood $U(x_0)$ of $x_0$ such that $U(x_0) \cap C = \emptyset$. Thus, there is a hyperplane $H$ that strictly separates $C$ and $U(x_0)$ (which is also compact). Let $n$ denote the vector normal to $H$ pointing towards $C$. Now, on $U(x_0)$:

$$\langle \dot x, n\rangle = \sum_{\alpha\in A} 2f'_\alpha(\|x-u^{(\alpha)}\|^2)\,\langle u^{(\alpha)}-x, n\rangle + \sum_{u} 2g'(\|x-u\|^2)\,\langle u-x, n\rangle > 0 \qquad (A.5)$$

where the inequality follows from the monotonicity of the $f_\alpha(\cdot)$'s and $g(\cdot)$, and from the geometry of the problem. Thus, in particular, $\dot x \ne 0$ at $x_0 \in U(x_0)$, which contradicts the assumption that $x_0$ is a fixed point of (1).

We have also shown by (A.5) that there is a drift towards $C$ from any point $x \notin C$. □
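The sign argument behind (A.5) can be illustrated numerically without the extra $g$-terms. The Python sketch below assumes that evolution (1) is $\dot x = -\nabla V(x)$ with $V(x) = \sum_\alpha f(\|x-u^{(\alpha)}\|^2)$ and $f(d) = -d^{-m}$. Then $\dot x = \sum_\alpha 2f'(\|x-u^{(\alpha)}\|^2)(u^{(\alpha)} - x)$, which matches the first sum in (A.5). The memories, the test points and the separating normals are arbitrary illustrative choices.

```python
def velocity(x, memories, m):
    """x_dot = sum_a 2 f'(||x - u_a||^2) (u_a - x), with f(d) = -d**(-m), f'(d) = m d**(-m-1)."""
    v = [0.0] * len(x)
    for u in memories:
        d = sum((xi - ui) ** 2 for xi, ui in zip(x, u))
        coef = 2 * m * d ** (-m - 1)
        for i in range(len(x)):
            v[i] += coef * (u[i] - x[i])
    return v


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Memories at the corners of the unit square; C is their convex hull.
memories = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# x lies beyond the line x1 = 2, which separates it from C; n points towards C.
x, n = (3.0, 0.5), (-1.0, 0.0)
print(dot(velocity(x, memories, 1.0), n))  # positive: drift towards C
```

Every term $\langle u^{(\alpha)} - x, n\rangle$ is positive because all memories lie on the $C$ side of the separating line. The coefficients $2f'(\cdot)$ are positive because $f$ is increasing. So the sum is positive, as in (A.5).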


Proof of Lemma 4. Let us define $R(t) = \|x(t) - u^{(\alpha)}\|$. Then, for evolution (1):

$$\dot R(t) = -\frac{\langle W(x(t)),\, x(t) - u^{(\alpha)}\rangle}{R(t)} \le -2\Big\{f'(r^2)\,r + \int_1^\infty (2\rho+1)^N \frac{d}{d\rho}\big[f'((\rho-r)^2)(\rho-r)\big]\,d\rho\Big\} \qquad (A.6)$$

where $r = \theta\epsilon(m,N)$. In deriving (A.6) we used (A.4) and the condition $\|x(0) - u^{(\alpha)}\| \le \theta\epsilon(m,N)$, which ensures that $R(t) \le R(0) \le r$ (due to (15)). However, (13)-(15) also bound the r.h.s. of (A.6) for $f(d) = -k(d/d_0)^{-m}$, and give an upper bound of the form

$$\dot R(t) \le -c \qquad (A.7)$$

with a constant $c > 0$ that depends only on $k$, $m$, $d_0$, $\theta$ and $\epsilon(m,N)$. Integrating (A.7), and using $R(0) \le \theta\epsilon(m,N)$, we obtain $R(t) \le 0$ for $t \ge T$, where $T = \theta\epsilon(m,N)/c$.

However, since $R(t) \ge 0$, $R(t) \le 0$ implies $R(t) = 0$, which implies $x(t) = u^{(\alpha)}$ for $t \ge T$. □
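The last step only uses a constant negative bound on $\dot R$: if $\dot R(t) \le -c$ while $R > 0$, then $R(t) \le R(0) - ct$, so $R$ must reach 0 by $T = R(0)/c$. The Python sketch below illustrates this in a deliberately simple, hypothetical case: a single memory at the origin in one dimension with $f(d) = -k(d/d_0)^{-m}$. Here $\dot R = -2f'(R^2)R = -2kmd_0^m R^{-2m-1}$. The speed only grows as $R$ shrinks, so $c = 2kmd_0^m R(0)^{-2m-1}$ works. Forward Euler is used purely for illustration.

```python
def time_to_reach_memory(R0, k, m, d0, dt=1e-6, max_steps=10**7):
    """Forward Euler on R' = -2 k m d0**m R**(-2m-1) (single memory, assumed f) until R <= 0."""
    R, t, steps = R0, 0.0, 0
    while R > 0 and steps < max_steps:
        R -= dt * 2 * k * m * d0 ** m * R ** (-2 * m - 1)
        t += dt
        steps += 1
    return t


R0, k, m, d0 = 0.2, 1.0, 1.0, 1.0
c = 2 * k * m * d0 ** m * R0 ** (-2 * m - 1)  # speed at R0, the slowest on (0, R0]
T = R0 / c                                     # from R(t) <= R(0) - c t
print(time_to_reach_memory(R0, k, m, d0), T)
```

The simulated hitting time stays below $T$ (up to one time step). With several memories the constant $c$ must instead come from a bound such as (A.6).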

Proof of Theorem 2. Consider a neighbor $y$ of $x(n)$ at Hamming distance one. Then either (A) $\|y - u^{(i)}\| = \|x(n) - u^{(i)}\| - \frac1N$, or (B) $\|y - u^{(i)}\| = \|x(n) - u^{(i)}\| + \frac1N$. For $x(n) \ne u^{(i)}$, there are exactly $N\|x(n) - u^{(i)}\| \ge 1$ neighbors of type (A).


The theorem is thus a direct consequence of the following claim (when (17) holds).

Claim: For any $x$ such that $\|x - u^{(i)}\| < 2\theta\rho$, $V(y) < V(x)$ for neighbors $y$ of type (A), and $V(y) \ge V(x)$ for neighbors $y$ of type (B).

Proof of the Claim. Note that for any $1 \le j \le K$, $-\frac1N \le \|y - u^{(j)}\| - \|x - u^{(j)}\| \le \frac1N$, since $y$ and $x$ differ in only one component. Furthermore, it is enough to show that $V(y) \ge V(x)$ for neighbors $y$ of type (B), with strict inequality for $\|x - u^{(i)}\| < 2\theta\rho$. Indeed, if $y$ is of type (A) w.r.t. $x$, then $\|y - u^{(i)}\| < 2\theta\rho$ as well, and $x$ is of type (B) w.r.t. $y$.

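Since the function $f(d)$ is monotonically increasing, and $y$ is of type (B):

$$f(\|y-u^{(j)}\|^2) \ge f\big((\|x-u^{(j)}\| - \tfrac1N)^2\big) \quad \forall j \ne i, \qquad f(\|y-u^{(i)}\|^2) = f\big((\|x-u^{(i)}\| + \tfrac1N)^2\big) \qquad (A.9)$$

Hence, for $f(d) = -d^{-m}$,

$$V(y) - V(x) \ge \Big\{\|x-u^{(i)}\|^{-2m} - \big(\|x-u^{(i)}\| + \tfrac1N\big)^{-2m}\Big\} - \sum_{j\ne i}\Big\{\big(\|x-u^{(j)}\| - \tfrac1N\big)^{-2m} - \|x-u^{(j)}\|^{-2m}\Big\} \qquad (A.10)$$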


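But $\|x - u^{(i)}\| < 2\theta\rho$ and, since $\|u^{(i)} - u^{(j)}\| \ge 2\rho$, $\|x - u^{(j)}\| \ge \|u^{(i)} - u^{(j)}\| - \|x - u^{(i)}\| > 2\rho(1-\theta)$. Moreover, the function $g(r) = r^{-2m} - (r+\Delta)^{-2m}$ is monotonically decreasing. So from (A.10):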


$$V(y) - V(x) \ge \Big\{(2\theta\rho)^{-2m} - \big(2\theta\rho + \tfrac1N\big)^{-2m}\Big\} - (K-1)\Big\{\big(2\rho(1-\theta) - \tfrac1N\big)^{-2m} - \big(2\rho(1-\theta)\big)^{-2m}\Big\} \qquad (A.11)$$

with strict inequality whenever $\|x - u^{(i)}\| < 2\theta\rho$. To complete the proof, we just have to show that the r.h.s. of (A.11) is non-negative whenever (17) holds. This is easily shown by a simple rearrangement of (A.11), using $\theta \le 1/2$ and $(1-\theta) \ge 1/2$. □
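The Claim suggests a simple way to watch the discrete dynamics. From a point near $u^{(i)}$, flipping a bit that moves towards $u^{(i)}$ (type (A)) lowers $V$, while type (B) flips do not. The Python sketch below runs a greedy single-bit-flip descent on $V(x) = -\sum_j \|x-u^{(j)}\|^{-2m}$, taking $\|\cdot\|$ as the Hamming distance divided by $N$ (so one flip changes it by $1/N$). The greedy rule, the parameter values and the function names are assumptions for illustration, not the update rule of the text.

```python
import math


def potential(x, memories, m):
    """V(x) = -sum_j s_j**(-2m), s_j = Hamming distance to memory j divided by N."""
    n_bits = len(x)
    total = 0.0
    for u in memories:
        s = sum(a != b for a, b in zip(x, u)) / n_bits
        if s == 0:
            return -math.inf  # x is a stored memory: the potential has a pole there
        total -= s ** (-2 * m)
    return total


def greedy_descent(x, memories, m, max_steps=1000):
    """Repeatedly apply the single-bit flip that lowers V the most; stop when none does."""
    x = list(x)
    for step in range(max_steps):
        best_i, best_v = None, potential(x, memories, m)
        for i in range(len(x)):
            x[i] ^= 1                       # try flipping bit i
            v = potential(x, memories, m)
            x[i] ^= 1                       # undo the trial flip
            if v < best_v:
                best_i, best_v = i, v
        if best_i is None:
            return x, step
        x[best_i] ^= 1
    return x, max_steps


N = 16
memories = [[0] * N, [1] * N]
start = [1, 1] + [0] * (N - 2)   # two bits away from the all-zeros memory
print(greedy_descent(start, memories, m=1))
```

Starting two flips from the all-zeros pattern, each accepted move is of type (A) with respect to that pattern. The descent therefore reaches it in two steps and stops there.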